On the Consistency of AUC Optimization

Authors

  • Wei Gao
  • Zhi-Hua Zhou
Abstract

AUC (area under the ROC curve) is an important evaluation criterion that has been widely used in diverse learning tasks such as class-imbalance learning, cost-sensitive learning, learning to rank, and information retrieval. Many learning approaches have been developed to optimize AUC; however, owing to its non-convexity and discontinuity, almost all of them work with surrogate loss functions. The study of AUC consistency is therefore crucial, and a previous study showed that classification calibration is necessary and sufficient for AUC consistency. In this paper, we show that, for pairwise surrogate losses of AUC, minimizing the expected risk over the whole distribution is not equivalent to minimizing the conditional risk on each pair of instances. We show that classification calibration is necessary yet insufficient for AUC consistency, and we provide a new sufficient condition for the asymptotic consistency of learning approaches based on surrogate loss functions. Based on this finding, we prove that exponential loss, logistic loss, and distance-weighted loss are consistent with AUC. We then derive the q-norm hinge loss and general hinge loss, which are consistent with AUC. We also derive the consistent bounds for exponential loss and logistic loss, and obtain the consistent bounds for many surrogate loss functions under the non-noise setting. Furthermore, we disclose an equivalence between the exponential surrogate loss of AUC and the exponential surrogate loss of accuracy; one straightforward consequence of this finding is that AdaBoost and RankBoost are equivalent.
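To make the pairwise setting in the abstract concrete, the following is a minimal Python sketch, not code from the paper. It computes the empirical AUC of a scoring function and the empirical pairwise surrogate risk, i.e. the average of a loss phi applied to the score difference f(x+) - f(x-) over all positive/negative pairs. The function names, the tie-handling convention (a tie counts as 1/2), and the toy scores are assumptions chosen for illustration.

```python
import math


def empirical_auc(scores_pos, scores_neg):
    """Fraction of positive/negative pairs ranked correctly.

    Assumed convention: a tied pair contributes 1/2.
    """
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


def pairwise_surrogate_risk(scores_pos, scores_neg, phi):
    """Average of phi(f(x+) - f(x-)) over all positive/negative pairs."""
    total = sum(phi(sp - sn) for sp in scores_pos for sn in scores_neg)
    return total / (len(scores_pos) * len(scores_neg))


def exponential_loss(t):
    """Exponential surrogate: exp(-t)."""
    return math.exp(-t)


def logistic_loss(t):
    """Logistic surrogate: log(1 + exp(-t))."""
    return math.log1p(math.exp(-t))


# Hypothetical scores assigned by some scoring function f.
pos = [0.9, 0.4, 0.7]  # scores of positive instances
neg = [0.1, 0.5]       # scores of negative instances

# 5 of the 6 pairs are ordered correctly (only 0.4 vs 0.5 is misranked).
print("AUC:", empirical_auc(pos, neg))
print("exponential risk:", pairwise_surrogate_risk(pos, neg, exponential_loss))
print("logistic risk:", pairwise_surrogate_risk(pos, neg, logistic_loss))
```

Because exp(-t) is at least 1 for t < 0 and equals 1 at t = 0, the exponential surrogate risk computed this way is never smaller than 1 - AUC on the same sample. Whether driving the surrogate risk to its minimum also drives AUC to its optimum is the consistency question the paper studies.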

Similar resources

On the Consistency of AUC Pairwise Optimization

AUC (Area Under ROC Curve) has been an important criterion widely used in diverse learning tasks. To optimize AUC, many learning approaches have been developed, most working with pairwise surrogate losses. Thus, it is important to study the AUC consistency based on minimizing pairwise surrogate losses. In this paper, we introduce the generalized calibration for AUC optimization, and prove that ...

AUC optimization and the two-sample problem

The purpose of the paper is to explore the connection between multivariate homogeneity tests and AUC optimization. The latter problem has recently received much attention in the statistical learning literature. From the elementary observation that, in the two-sample problem setup, the null assumption corresponds to the situation where the area under the optimal ROC curve is equal to 1/2, we pro...

Herbal plants zoning using target detection algorithms on time-series of Sentinel-2 multispectral images (Amygdalus Scoparia)

Today, medicinal plants have a special place in the economy and health of a society. Because many of these plants grow naturally, zoning them for optimal utilization is necessary. Traditional zoning solutions are not efficient due to their low accuracy and speed; therefore, a new approach is needed. Remote sensing data have many applications in various fie...

An improved particle swarm optimization with a new swap operator for team formation problem

The formation of effective teams of experts plays a crucial role in successful projects, especially in social networks. In this paper, a new particle swarm optimization (PSO) algorithm is proposed for solving a team formation optimization problem by minimizing the communication cost among experts. The proposed algorithm is called improved particle swarm optimization with a new swap operator (IPSONSO...

Shape optimization of impingement and film cooling holes on a flat plate using a feedforward ANN and GA

Numerical simulations of a three-dimensional model of impingement and film cooling on a flat plate are presented and validated with the available experimental data. Four different turbulence models were utilized for simulation, in which SST had the highest precision, resulting in less than 4% maximum error in temperature estimation. A simplified geometry with periodic boundary conditions is de...

Journal:
  • CoRR

Volume: abs/1208.0645

Publication date: 2012